Generalized Multi-view Embedding for Visual Recognition and Cross-modal Retrieval

نویسندگان

  • Guanqun Cao
  • Alexandros Iosifidis
  • Ke Chen
  • Moncef Gabbouj
چکیده

In this paper, the problem of multi-view embedding from different visual cues and modalities is considered. We propose a unified solution for subspace learning methods using the Rayleigh quotient, which is extensible for multiple views, supervised learning, and nonlinear embeddings. Numerous methods including canonical correlation analysis, partial least square regression, and linear discriminant analysis are studied using specific intrinsic and penalty graphs within the same framework. Nonlinear extensions based on kernels and (deep) neural networks are derived, achieving better performance than the linear ones. Moreover, a novel multi-view modular discriminant analysis is proposed by taking the view difference into consideration. We demonstrate the effectiveness of the proposed multi-view embedding methods on visual object recognition and cross-modal image retrieval, and obtain superior results in both applications compared to related methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities. Learning appropriate representations for multi-modal data is crucial for the cross-modal retrieval performance. Unlike existing image-text retrieval approaches that embed image-text pairs as single feature vectors in a common representational space, we propose ...

متن کامل

Steganography Scheme Based on Reed-Muller Code with Improving Payload and Ability to Retrieval of Destroyed Data for Digital Images

In this paper, a new steganography scheme with high embedding payload and good visual quality is presented. Before embedding process, secret information is encoded as block using Reed-Muller error correction code. After data encoding and embedding into the low-order bits of host image, modulus function is used to increase visual quality of stego image. Since the proposed method is able to embed...

متن کامل

Objects that Sound

In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can localize the object that sounds in an image, given the audio signal. We achieve both these objectives by training from unlabelled video using only audio-visual correspondence (AVC) as the objective function. This ...

متن کامل

Learning from Multiple Views of Data

Title of dissertation: LEARNING FROM MULTIPLE VIEWS OF DATA Abhishek Sharma, Doctor of Philosophy, 2015 Proposal directed by: Professor David W. Jacobs Department of Computer Science This dissertation takes inspiration from the abilities of our brain to extract information and learn from multiple sources of data and try to mimic this ability for some practical problems. It explores the hypothes...

متن کامل

Assessment of learning style based on VARK model among the students of Qom University of Medical Sciences

Introduction: Learning is a dominant phenomenon in human life. Learners are different from each other in terms of attitudes and cognitive styles which effect on the learning of people. In this connection, VARK learning style assess the students base their individual abilities and method for obtaining much information from environment in dimensions of visual, aural, read/write, and kinesthetic. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IEEE transactions on cybernetics

دوره   شماره 

صفحات  -

تاریخ انتشار 2017